
refactor(scan): extract shared pipeline helpers into tasks/_scan_pipeline #405

Merged
haksungjang merged 1 commit into main from feat/sbom-ingest-pipeline-helpers
Jun 13, 2026

Conversation

@haksungjang

Contributor

What

Behavior-preserving extraction of the source-scan pipeline's self-contained orchestration helpers into a new tasks/_scan_pipeline.py, so the upcoming external SBOM-ingest Celery task can reuse them through a clean public seam, without a `from tasks.scan_source import _private_name` cross-module reach.

Prereq refactor for the SBOM ingest feature (#404 added the sbom scan kind; the ingest endpoint/task land next).

Changes

  • New tasks/_scan_pipeline.py with public mark_failed, record_terminal_failure, mark_succeeded (moved verbatim) and a generalized set_stage(scan_uuid, stage, percent).
  • set_stage takes percent explicitly instead of reaching into the source-only _STAGE_PROGRESS. scan_source._set_stage stays as a thin wrapper passing _STAGE_PROGRESS.get(stage); None keeps the row's prior percent, matching the original .get(stage, prior) fallback exactly.
  • scan_source keeps thin _mark_* / _set_stage aliases so its own call sites and the existing monkeypatch.setattr(scan_source, "_record_terminal_failure", …) tests stay unchanged.
  • _persist_components → public persist_sbom_components (in-place rename; the ~700-line sub-helper cluster stays private, not relocated).
  • Tests repointed where the implementation moved (progress-hook monkeypatch targets → tasks._scan_pipeline).
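
The set_stage change above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the in-memory `_ROWS` dict stands in for the DB write, log event, and publish, and the stage names and percent values in `_STAGE_PROGRESS` are made up.

```python
from typing import Optional

# Hypothetical stand-in for the scan row; the real helper writes to the DB,
# emits the scan_stage log event, and publishes progress.
_ROWS: dict[str, dict] = {"scan-1": {"stage": None, "progress_percent": 10}}

# Source-only mapping owned by the caller (stage names and values are invented).
_STAGE_PROGRESS = {"clone": 10, "sbom": 40, "match": 80}


def set_stage(scan_uuid: str, stage: str, percent: Optional[int]) -> None:
    """Shared writer: percent=None leaves the row's prior percent untouched."""
    row = _ROWS[scan_uuid]
    row["stage"] = stage
    if percent is not None:
        row["progress_percent"] = percent


def _set_stage(scan_uuid: str, stage: str) -> None:
    """Thin source-scan wrapper: look up the percent, then delegate."""
    set_stage(scan_uuid, stage, _STAGE_PROGRESS.get(stage))


_set_stage("scan-1", "sbom")          # known stage: mapped percent is written
known = dict(_ROWS["scan-1"])
_set_stage("scan-1", "not-a-stage")   # unknown stage: prior percent is kept
unknown = dict(_ROWS["scan-1"])
```

Because the shared function no longer reads a module-level table, a second pipeline with a different stage map can call it without importing anything from scan_source.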

Behavior preservation

  • Status transitions, completed_at, progress_percent or 0 snapshot, supersede_prior_ref_scans call, succeeded percent=100, succeeded/failed step strings, commit-then-publish ordering, and the scan_stage log event/fields are byte-identical.
  • set_stage: known stage → mapped int (DB+log+publish); unknown stage → percent=None → prior percent kept, log carries None. Both paths are identical to the original.
  • make_line_callback was already public in tasks/_progress (prior extraction), so it is reused there, not duplicated.

Scope notes

  • No DB/API/frontend change.
  • scan_container.py has its own copies of these helpers — out of scope here, a natural follow-up consumer of the shared module (its _STAGE_PROGRESS differs).

Verification

  • mypy . (full, 442 files): clean.
  • ruff check on changed files: clean.
  • Backend pytest needs docker/redis (can't run locally) — CI test(backend) is the gate; behavior preservation argued above.

…line

Pull the self-contained terminal-state writers and the per-stage progress
writer out of scan_source into a new tasks/_scan_pipeline module, so the
upcoming SBOM-ingest Celery task can reuse them through a public seam
instead of reaching into a sibling task module's privates.

Behaviour-preserving — no functional change to the source scan:
- mark_failed / record_terminal_failure / mark_succeeded moved verbatim
- set_stage generalised to take an explicit `percent` (the source pipeline
  still owns _STAGE_PROGRESS and passes .get(stage); None keeps the prior
  percent, matching the original .get(stage, prior) fallback exactly)
- scan_source keeps thin _mark_*/_set_stage aliases so its own call sites
  and the monkeypatch-based tests stay unchanged
- _persist_components renamed to public persist_sbom_components in place
  (the ~700-line sub-helper cluster stays private, not relocated)

make_line_callback was already public in tasks/_progress (extracted in a
prior PR), so it is reused from there, not duplicated. No import cycle.
@haksungjang haksungjang merged commit 0bc5fd9 into main Jun 13, 2026
28 of 29 checks passed
@haksungjang haksungjang deleted the feat/sbom-ingest-pipeline-helpers branch June 13, 2026 15:38
haksungjang added a commit that referenced this pull request Jun 13, 2026
image-scan kept HARD-failing on lodash 4.17.19 (CVE-2021-23337, CVE-2026-4800)
and minimist 1.2.5 (CVE-2021-44906) even after the cdxgen 12.3.3→12.5.1 bump,
which only rebuilt the cdxgen layer. A fresh local install of cdxgen 12.5.1
and of npm 11.14.1 — the image's only two npm-package installers — pulls
neither package, and these CVEs were never in .trivyignore, yet image-scan
passed on #404/#405. The vulnerable copies therefore live in a stale, earlier
`scope=worker` cache layer (a non-deterministic npm-install resolution cached
long ago), not in anything the current Dockerfile produces.

Bumping the buildx GHA cache scope (worker → worker-v2) abandons the poisoned
cache and forces a single clean rebuild; the new namespace caches the clean
tree. Keeps the cdxgen 12.5.1 bump (latest 12.x, verified lodash/minimist-free).
haksungjang added a commit that referenced this pull request Jun 14, 2026
…rker image-scan (#407)

A no-cache linux/amd64 rebuild of the worker image (image-scan gate) HARD-fails
on three node-pkg findings vendored under cdxgen's global install tree:
  - lodash 4.17.19   CVE-2021-23337 (HIGH), CVE-2026-4800 (HIGH)
  - minimist 1.2.5   CVE-2021-44906 (CRITICAL)

These are pulled by a platform-gated (cpu=x64/os=linux) transitive of cdxgen's
dependency graph: a fresh `npm install -g @cyclonedx/cdxgen@12.3.3` on
linux/amd64 resolves them, while the same install on arm64/macOS resolves
neither — so they were masked by the cached worker layer (image-scan passed on
#404/#405) and surfaced only once that GHA cache evicted and CI did a clean
amd64 rebuild. It is a pre-existing, main-wide latent issue, unrelated to any
one feature PR.

Add .trivyignore entries following the file's policy (CVE + target + CVSS +
reach analysis + re-evaluate date). All three are UNREACHED: cdxgen is invoked
only for dependency enumeration with a fixed argv, never calls lodash.template
on scanned-repo input, and the worker never invokes lodash/minimist directly.
Re-evaluate when cdxgen ships a fixed vendored tree.
haksungjang added a commit that referenced this pull request Jun 14, 2026
* feat(scan): external CycloneDX SBOM ingest endpoint

Add POST /v1/projects/{id}/sbom-ingest so external tools (CI, cdxgen-based
scanners) can upload an already-generated CycloneDX SBOM; TRUSCA runs the
back half of the scan pipeline against it — persist components → trivy sbom
matching → findings — reusing the Scan model so ingested scans get ref-keyed
retention, the per-project active-scan guard, and the existing
Components/Vulnerabilities/Licenses UI and build gate for free.

This is NOT a Dependency-Track compatible surface: it is a TRUSCA-native
endpoint (Authorization: Bearer, field `sbom`, no autoCreate), not DT's
/api/v1/bom + X-Api-Key.

Endpoint / service (services/sbom_ingest_service.py, api/v1/sbom.py):
- multipart sbom + ref + release; 202 ScanPublic (kind="sbom").
- require_role_or_api_key("developer"); project-scoped key must match.
- Reuses trigger_scan's guards via an extracted prepare_scan_target
  (existence/team 404/403 before archived 409 / cap 429 — authz before state).
- Synchronous adversarial validation of untrusted input: bounded read
  (SBOM_INGEST_MAX_BYTES, 32 MiB → 413), content-type/filename allow-list
  (415), JSON + CycloneDX structure whitelist (422), component cap
  (SBOM_INGEST_MAX_COMPONENTS, 50k → 422), and an O(n) string-aware byte
  nesting-depth pre-check so a deeply nested document is a clean 422 instead
  of a RecursionError → 500 from json.loads. RFC 7807 throughout.
- Atomic: flush wins the active-scan race before the file is written; a 409
  loser writes no file; commit-race deletes the file; enqueue failure → 503.
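
The string-aware nesting-depth pre-check mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not the validator from this commit: the function names and the `limit` parameter are hypothetical. It scans the raw bytes once, ignores brackets inside JSON strings (including after escaped quotes), and never recurses.

```python
def max_nesting_depth(raw: bytes) -> int:
    """Return the deepest object/array nesting in raw JSON bytes in one O(n) pass."""
    depth = 0
    deepest = 0
    in_string = False
    escaped = False
    for byte in raw:
        if in_string:
            if escaped:
                escaped = False          # this byte is escaped; skip it
            elif byte == 0x5C:           # backslash starts an escape
                escaped = True
            elif byte == 0x22:           # unescaped quote closes the string
                in_string = False
        elif byte == 0x22:               # quote opens a string
            in_string = True
        elif byte in (0x7B, 0x5B):       # '{' or '['
            depth += 1
            deepest = max(deepest, depth)
        elif byte in (0x7D, 0x5D):       # '}' or ']'
            depth = max(depth - 1, 0)    # malformed input is left for the parser
    return deepest


def exceeds_depth(raw: bytes, limit: int) -> bool:
    """Hypothetical guard: True means reject before calling json.loads."""
    return max_nesting_depth(raw) > limit
```

Working on bytes is safe for UTF-8 input because multi-byte sequences never contain ASCII bytes such as quotes or brackets. The check only bounds depth; syntax errors are still left to the JSON parser.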

Celery task (tasks/ingest_sbom.py, enqueue branch + include):
- ingest_sbom_task reuses persist_sbom_components → run_trivy_sbom →
  persist_trivy_findings → mark_succeeded (ref-keyed supersede). Preserves the
  uploaded SBOM as a durable sbom_cyclonedx ScanArtifact for the signature
  surface; containment-guards the path under workspace_root().
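
A containment guard of the kind described above might be sketched as follows. The helper name `resolve_under` and the example root are assumptions for illustration; the commit only states that the path is checked against workspace_root().

```python
from pathlib import Path


def resolve_under(root: Path, candidate: str) -> Path:
    """Resolve candidate against root and refuse anything that escapes it."""
    root_resolved = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root_resolved / candidate).resolve()
    if not target.is_relative_to(root_resolved):  # Python 3.9+
        raise ValueError(f"path escapes workspace root: {candidate!r}")
    return target


# Example root is hypothetical; it does not need to exist for resolve().
root = Path("/srv/workspace")
inside = resolve_under(root, "scans/abc/sbom.json")

try:
    resolve_under(root, "../../etc/passwd")
    escaped = True
except ValueError:
    escaped = False
```

An absolute `candidate` replaces the root when joined, so it also fails the `is_relative_to` check unless it already points inside the root.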

Security (Producer-Reviewer findings addressed):
- bind_audit_team before the scan INSERT so the audit row carries team_id.
- disk-write failure → 503 SbomIngestStorageError (retryable), not 422.
- release / original_filename length-capped + control-byte stripped.
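
As a rough sketch of the length cap and control-byte stripping described above (the function name and the 255-character default are assumptions, not values taken from this commit):

```python
def sanitize_label(value: str, max_len: int = 255) -> str:
    """Drop C0 control characters and DEL, then cap the length."""
    cleaned = "".join(
        ch for ch in value if ord(ch) >= 0x20 and ord(ch) != 0x7F
    )
    return cleaned[:max_len]


release = sanitize_label("v1.2\x00\r\n-rc1")
long_name = sanitize_label("a" * 300)
```

Stripping before truncating means the cap applies to the characters that are actually stored.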

Tests: pure adversarial validator unit suite (incl. depth-bomb regression),
endpoint permission×state matrix + new existence-hide-state 409 rows,
realistic multi-CVE fixture pipeline test. Docs: EN/KO ci-integration/sbom-upload.

* test(scan): regenerate OpenAPI snapshot for sbom-ingest endpoint

The OpenAPI contract snapshot test (test_openapi_no_drift) flagged the new
POST /v1/projects/{project_id}/sbom-ingest path. Add it to the committed
snapshot — path param project_id only (sbom/ref/release are requestBody).

* fix(worker): bump cdxgen 12.3.3 → 12.5.1 to bust stale image-scan node-pkg layer

image-scan (worker) HARD-failed on 3 node-pkg findings — lodash 4.17.19
(CVE-2021-23337, CVE-2026-4800) and minimist 1.2.5 (CVE-2021-44906) — that
live under @cyclonedx/cdxgen/node_modules. Reproduction in node:20-bookworm
shows cdxgen 11.x bundles both, while 12.3.3 AND 12.5.1 ship neither: a clean
build already lacks them, so the failure was a stale type=gha scope=worker
cache layer serving the pre-12.x install tree (same class as the earlier
php-symfony image-scan incident).

Bumping the version interpolated into the global npm install changes that
layer's cache key, forcing a fresh (clean) install — root-cause removal, not
a .trivyignore suppression (suppressing a package absent from a clean build
would wrongly mute a future regression). cdxgen invocation is unchanged across
12.3.3→12.5.1 and engines.node still allows ^20, so no scan regression. Fixes
main too (shared cache) once merged.

* fix(ci): bump worker image-scan GHA cache scope to force a clean rebuild

image-scan kept HARD-failing on lodash 4.17.19 (CVE-2021-23337, CVE-2026-4800)
and minimist 1.2.5 (CVE-2021-44906) even after the cdxgen 12.3.3→12.5.1 bump,
which only rebuilt the cdxgen layer. A fresh local install of cdxgen 12.5.1
and of npm 11.14.1 — the image's only two npm-package installers — pulls
neither package, and these CVEs were never in .trivyignore, yet image-scan
passed on #404/#405. The vulnerable copies therefore live in a stale, earlier
`scope=worker` cache layer (a non-deterministic npm-install resolution cached
long ago), not in anything the current Dockerfile produces.

Bumping the buildx GHA cache scope (worker → worker-v2) abandons the poisoned
cache and forces a single clean rebuild; the new namespace caches the clean
tree. Keeps the cdxgen 12.5.1 bump (latest 12.x, verified lodash/minimist-free).

* Revert "fix(ci): bump worker image-scan GHA cache scope to force a clean rebuild"

This reverts commit a17e5fa.

* Revert "fix(worker): bump cdxgen 12.3.3 → 12.5.1 to bust stale image-scan node-pkg layer"

This reverts commit 20a3040.